Performance aspects of sparse matrix-vector multiplication

Author

  • Ivan Šimecek
Abstract

Sparse matrix-vector multiplication (SpM×V for short) is an important building block in algorithms for solving sparse systems of linear equations, e.g., in FEM. Due to matrix sparsity, the memory access patterns are irregular and cache utilization can suffer from low spatial or temporal locality. Approaches to improve the performance of SpM×V are based on matrix reordering and register blocking [1,2], sometimes combined with software pipelining [3]. Due to its overhead, register blocking achieves good speedups only for a large number of executions of SpM×V with the same matrix A. We have investigated the impact of two simple software transformation techniques (software pipelining and loop unrolling) on the performance of SpM×V, and have compared them with several implementation modifications aimed at reducing computational and memory complexity and improving spatial locality. We investigate the performance gains of these modifications on four CPU platforms.

Terminology and notation

Consider a sparse n×n matrix A with elements A_ij ∈ ℝ, 1 ≤ i, j ≤ n. The largest distance between nonzero elements in any row is the bandwidth of matrix A, denoted by ω_B, i.e.,

l_i = min { j : A_ij ≠ 0 },
r_i = max { j : A_ij ≠ 0 },
ω_B = max_i (r_i − l_i + 1).

Storage schemes for sparse matrices

Compressed sparse row (CSR) format

Matrix A is represented by 3 linear arrays A, adr, and ci (see Figure 1). Array A stores the nonzero elements of the input matrix A, array adr[1..n] contains the indexes of the initial nonzero elements of the rows of A, and array ci contains the column indexes of the nonzero elements of A. Hence, the first nonzero element of row j is stored at index adr[j] in array A. (A plain SpM×V kernel over this layout is sketched after this section.)

Figure 1: The idea of the CSR format. a) A sparse matrix A in dense format. b) The CSR representation of A.

Figure 2: The idea of the static L-CSR format. a) A sparse matrix A in dense format. b) The static L-CSR representation of A.

Length-sorted CSR (L-CSR) storage format

The main idea is explained in [4]. The data is represented as in the CSR format, but the rows are sorted by length in increasing order, i.e., the length of row i is less than or equal to the length of row i+1. There are two variants (a traversal of the dynamic variant is sketched after this section):

1. Static: The rows are physically stored in the sorted order in the CSR format (see Figure 2).
2. Dynamic: The original CSR format is extended with two additional arrays (see Figure 3). Array member[1..n] contains the indexes of the rows after sorting. Array begin[1..ω_B] contains indexes into the array member: begin[i] is the index of the first row of length i in the array member.

Figure 3: The idea of the dynamic L-CSR format. a) A sparse matrix A in dense format. b) The dynamic L-CSR representation of A.
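To make the CSR layout concrete, the following is a minimal C sketch of an SpM×V kernel y = A·x over the arrays A, adr, and ci named above. It is not taken from the paper: the 0-based indexing and the sentinel entry adr[n] (one past the last nonzero) are assumptions of this sketch, whereas the description above uses 1-based arrays.

/* Minimal CSR SpMxV sketch: computes y = A*x for an n x n sparse matrix.
 * Array names (A, adr, ci) follow the CSR description above; the 0-based
 * indexing and the sentinel adr[n] = number of nonzeros are assumptions
 * of this sketch, not details taken from the paper. */
void spmv_csr(int n, const double *A, const int *adr, const int *ci,
              const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        /* nonzeros of row i occupy A[adr[i]] .. A[adr[i+1] - 1] */
        for (int k = adr[i]; k < adr[i + 1]; k++)
            sum += A[k] * x[ci[k]];
        y[i] = sum;
    }
}

The indirect, data-dependent access x[ci[k]] is the source of the poor spatial and temporal locality discussed in the abstract.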

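The dynamic L-CSR variant can be traversed length by length, which is what gives fixed-trip-count inner loops and makes loop unrolling attractive. Below is a hedged C sketch of such a traversal over the member and begin arrays described above; max_len, the 0-based indexing, the sentinel begin[max_len + 1], and the assumption that every row has at least one nonzero are all choices of this sketch, not details from the paper.

/* Sketch of SpMxV over the dynamic L-CSR structure: member[] lists row
 * indexes sorted by row length, begin[len] is the position in member[]
 * of the first row of length len. Assumptions of this sketch: 0-based
 * arrays, a sentinel begin[max_len + 1] equal to the number of rows,
 * and no empty rows (otherwise y must be zero-initialized first). */
void spmv_lcsr_dynamic(int max_len,
                       const double *A, const int *adr, const int *ci,
                       const int *member, const int *begin,
                       const double *x, double *y)
{
    for (int len = 1; len <= max_len; len++) {
        /* every row processed here has exactly `len` nonzeros, so the
         * innermost loop has a fixed trip count and can be unrolled
         * (or specialized) per length */
        for (int m = begin[len]; m < begin[len + 1]; m++) {
            int i = member[m];
            double sum = 0.0;
            for (int k = adr[i]; k < adr[i] + len; k++)
                sum += A[k] * x[ci[k]];
            y[i] = sum;
        }
    }
}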
Similar resources

Hybrid-Parallel Sparse Matrix–Vector Multiplication and Iterative Linear Solvers with the communication library GPI

We present a library of Krylov subspace iterative solvers built over the PGAS-type communication layer GPI. The hybrid pattern is the appropriate choice here to reveal the hierarchical parallelism of clusters with multi- and manycore nodes. Our approach includes asynchronous communication and differs in many aspects from the classical one. We first present the GPI-based implementation of the spar...

On improving the performance of sparse matrix-vector multiplication

We analyze single-node performance of sparse matrix-vector multiplication by investigating issues of data locality and fine-grained parallelism. We examine the data-locality characteristics of the compressed-sparse-row representation and consider improvements in locality through matrix permutation. Motivated by potential improvements in fine-grained parallelism, we evaluate modified sparse-matrix re...

Optimizing Sparse Matrix Vector Multiplication on SMPs

We describe optimizations of sparse matrix-vector multiplication on uniprocessors and SMPs. The optimization techniques include register blocking, cache blocking, and matrix reordering. We focus on optimizations that improve performance on SMPs, in particular, matrix reordering implemented using two different graph algorithms. We present a performance study of this algorithmic kernel, showing ho...

Sparse Matrix Multiplication on CAM Based Accelerator

Sparse matrix multiplication is an important component of linear algebra computations. In this paper, an architecture based on Content Addressable Memory (CAM) and Resistive Content Addressable Memory (ReCAM) is proposed for accelerating sparse matrix by sparse vector and matrix multiplication in CSR format. Using functional simulation, we show that the proposed ReCAM-based accelerator exhibits...

Vector ISA Extension for Sparse Matrix-Vector Multiplication

In this paper we introduce a vector ISA extension to facilitate sparse matrix manipulation on vector processors (VPs). First we introduce a new Block Based Compressed Storage (BBCS) format for sparse matrix representation and a Block-wise Sparse Matrix-Vector Multiplication approach. Additionally, we propose two vector instructions, Multiple Inner Product and Accumulate (MIPA) and LoaD Section ...

Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors

The low utilization of SIMD units and memory bandwidth is the main performance bottleneck on SIMD processors for sparse matrix-vector multiplication (SpMV), which is one of the most important kernels in many scientific and engineering applications. This paper proposes a hybrid optimization method to break the performance bottleneck of SpMV on SIMD processors. The method includes a new sparse ma...


Journal title:

Volume   Issue

Pages  -

Publication date: 2007